---
title: "HR Movement dashboard"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
# Dashboard
library(flexdashboard)
# Data manipulation
library(tidyverse) # data manipulation & plotting
library(stringr) # text cleaning and regular expressions
library(tidytext) # provides additional text mining functions
library(textdata)
library(dplyr)
# Plots
library(viridis)
library(wordcloud2)
#devtools::install_github("gaospecial/wordcloud2")
#library("gaospecial/wordcloud2")
library(RColorBrewer)
library(tm)
library(ggplot2)
library(emojifont)
library(plotly)
library(gridExtra)
# PieChart
library(lessR)
# Network
library(igraph)
library(ggraph)
library(ggiraph)
#devtools::install_github("dgrtwo/drlib")
```
```{r}
# Load the posts and derive a short page name from each Facebook page id
data <- read.csv("facebook_dataset.csv", header = TRUE, stringsAsFactors = FALSE)
data$page_name <- ""
# Assigning through the column is safe even when a page id has no matching rows
data$page_name[data$page_id == "86680728811"] <- "abc"
data$page_name[data$page_id == "228735667216"] <- "bbc"
data$page_name[data$page_id == "5550296508"] <- "cnn"
data$page_name[data$page_id == "15704546335"] <- "fox"
data$page_name[data$page_id == "5863113009"] <- "laTimes"
```
Dataset {data-icon="fa-table"}
=============================
### Dataset description
The dataset was taken from [data.world](https://data.world/martinchek/2012-2016-facebook-posts) and belongs to an [article](https://shift.newco.co/2016/11/09/What-I-Discovered-About-Trump-and-Clinton-From-Analyzing-4-Million-Facebook-Posts/) that analyzed 4 million Facebook posts to see what they said about Donald Trump and Hillary Clinton.
The source collection therefore contains Facebook posts written between 2012 and 2016 by 15 major news outlets.
The data is distributed as one CSV file per outlet, so I selected only 5 of those outlets to reduce the cost of analyzing such a large dataset.
The resulting dataset has 154,173 rows, each corresponding to a **Facebook post** published by one of the following pages:

- **BBC** (id 228735667216): The British Broadcasting Corporation.
- **CNN** (id 5550296508): Multinational cable news channel headquartered in Atlanta, Georgia, U.S.
- **Fox** (id 15704546335): The Fox News Channel.
- **ABC** (id 86680728811): ABC News is the news division of the American broadcast network ABC.
- **LA Times** (id 5863113009): Los Angeles Times: News from California, the nation and world.
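
The page ids listed above can be mapped to short page names with a named lookup vector rather than one assignment per page. The following is a minimal sketch on a toy data frame; `page_lookup` and `toy` are hypothetical names used only for illustration, and the ids are stored as character strings by assumption.

```r
# Hypothetical lookup table: Facebook page id -> short page name.
page_lookup <- c(
  "228735667216" = "bbc",
  "5550296508"   = "cnn",
  "15704546335"  = "fox",
  "86680728811"  = "abc",
  "5863113009"   = "laTimes"
)

# Toy stand-in for the real dataset; "999" is an id with no known page.
toy <- data.frame(
  page_id = c("5550296508", "228735667216", "999"),
  stringsAsFactors = FALSE
)

# Indexing a named vector by name yields NA for ids missing from the table.
toy$page_name <- unname(page_lookup[toy$page_id])
print(toy)
```

With this approach, adding another outlet only requires one more entry in the lookup vector, and unknown ids surface as `NA` instead of an empty string.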
### Data preview
```{r}
head(data)
```
### Field summary
```{r}
summary(data)
```
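
For character columns such as `post_type`, `summary()` reports only the length and class, so per-type counts need a separate step. The sketch below uses a hypothetical toy data frame (`toy`) to show two ways of counting rows per type, and a common pitfall: `length()` applied to a filtered data frame returns the number of columns, not the number of rows.

```r
# Toy stand-in for the posts table (values are made up).
toy <- data.frame(
  post_type   = c("link", "photo", "photo", "video", "link", "photo"),
  likes_count = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Counts for every type in one call.
type_counts <- table(toy$post_type)
print(type_counts)

# Count for a single type: sum a logical vector.
n_photo <- sum(toy$post_type == "photo")

# Pitfall: length() of a data frame is its number of columns.
n_cols <- length(toy[toy$post_type == "photo", ])

# nrow() gives the row count of the filtered data frame.
n_rows <- nrow(toy[toy$post_type == "photo", ])
```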
General {data-icon="fa-chart-line"}
=============================
Row {data-width=150}
--------------------------------------
### Number of posts
```{r}
total_post <- nrow(data)
valueBox(value = total_post, icon = "fa-facebook", caption = "Number of posts", color = "#C1FFC1")
```
### Number of pages
```{r}
total_pages <- length(unique(data$page_id))
valueBox(value = total_pages, icon = "fa-file", caption = "Number of pages", color = "#FFD700")
```
### Number of likes
```{r}
likes_total <- sum(data$likes_count)
valueBox(value = likes_total, icon = "fa-thumbs-up", caption = "Number of likes", color = "#FFEC8B")
```
Row {data-width=150}
--------------------------------------
### Number of links
```{r}
# Count matching rows; length() on a data frame would return the number of columns
link_post <- sum(data$post_type == "link", na.rm = TRUE)
valueBox(value = link_post, icon = "fa-link", caption = "Number of links", color = "#EEEEE0")
```
### Number of videos
```{r}
video_post <- sum(data$post_type == "video", na.rm = TRUE)
valueBox(value = video_post, icon = "fa-video", caption = "Number of videos", color = "#BF3EFF")
```
### Number of songs
```{r}
music_post <- sum(data$post_type == "music", na.rm = TRUE)
valueBox(value = music_post, icon = "fa-music", caption = "Number of songs", color = "#7FFFD4")
```
Row {data-width=150}
--------------------------------------
### Number of events
```{r}
# Count matching rows; length() on a data frame would return the number of columns
event_post <- sum(data$post_type == "event", na.rm = TRUE)
valueBox(value = event_post, icon = "fa-calendar", caption = "Number of events", color = "#FFA07A")
```
### Number of photos
```{r}
photo_post <- sum(data$post_type == "photo", na.rm = TRUE)
# Show the computed count instead of a hard-coded value
valueBox(value = photo_post, icon = "fa-image", caption = "Number of photos", color = "#FF3030")
```
### Number of comments
```{r}
comments_total <- sum(data$comments_count)
# Show the comment total, not the music post count
valueBox(value = comments_total, icon = "fa-comment", caption = "Number of comments", color = "#E0FFFF")
```
Row {data-height=500}
----------------------------------
<div style="height:500px">
### Number of posts by type
```{r}
# geom_bar() counts the rows for each post type, so no grouping is needed beforehand
plot <- ggplot(data = data, aes(x = post_type, fill = post_type)) +
  geom_bar() +
  labs(x = "Post type", y = "Number of posts")
ggplotly(plot)
```
</div>
### Number of posts by page and type
```{r}
data_by_page_type <- data %>%
  dplyr::group_by(page_name, post_type) %>%
  dplyr::summarise(total = n(), .groups = "drop")
# Stacked bars: one bar per page, split by post type
plot <- ggplot(data_by_page_type, aes(fill = post_type, y = total, x = page_name)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_fill_viridis(discrete = TRUE) +
  xlab("Page") +
  ylab("Number of posts")
ggplotly(plot)
```
Row
----------------------------------
<div style="height:500px">
### Grouped by year
```{r}
# Count posts per calendar month and year
data_month <- data %>%
  dplyr::group_by(month = lubridate::month(date), year = lubridate::year(date)) %>%
  dplyr::summarise(total = n(), .groups = "drop") %>%
  dplyr::arrange(year, month)
# Zero-padded month labels keep the x axis in calendar order
data_month$month <- factor(sprintf("%02d", data_month$month), levels = sprintf("%02d", 1:12))
data_month$year <- as.character(data_month$year)
# One panel per year, with one line per year inside each panel
plot <- ggplot(data = data_month, mapping = aes(x = month, y = total, shape = year, color = year)) +
  geom_point() +
  geom_line(aes(group = year)) +
  facet_grid(facets = year ~ ., margins = FALSE) + theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  ggtitle("Number of posts by month and year") +
  xlab("Month") +
  ylab("Number of posts")
ggplotly(plot)
```
</div>
<div style="height:500px">
### Overview
```{r}
# Without facets, group by year so each year gets its own line
plot <- ggplot(data = data_month, mapping = aes(x = month, y = total, shape = year, color = year)) +
  geom_point() +
  geom_line(aes(group = year)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  ggtitle("Number of posts by month and year") +
  xlab("Month") +
  ylab("Number of posts")
ggplotly(plot)
```
</div>
Row
----------------------------------
<div style="height:600px">
### Number of posts by page and year
```{r}
# Count posts per page and year; drop rows without a page name
data_by_page <- data %>%
  dplyr::group_by(page = page_name, year = lubridate::year(date)) %>%
  dplyr::summarise(total = n(), .groups = "drop") %>%
  dplyr::arrange(desc(total)) %>%
  dplyr::filter(!is.na(page))
# Point size encodes the number of posts
plot <- ggplot(data_by_page, aes(year, page)) +
  geom_point(aes(size = total), colour = "red") +
  ggtitle("Posts by page and year") +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
  xlab("Year") +
  ylab("Page")
ggplotly(plot)
```
</div>
Reactions {data-icon="fa-comments"}
===
Row
-----------------------------------------------------------------------
```{r}
# One emoji per reaction, in the same order as `names` below
emojis <- c(#emoji("email"),# share
            #emoji("thumbsup"),# like
            emoji("heart"),# love
            emoji("joy"),# haha
            emoji("cry"),# sad
            emoji("rage"),# angry
            emoji("innocent")# thankful
)
#names <- c("shares","likes","love","haha","sad","angry", "thankful")
names <- c("love", "haha", "sad", "angry", "thankful")
```
<div style="height:400px">
### BBC reactions
```{r}
# Total of each reaction across all BBC posts
data_bbc <- tibble(names = names,
                   emoji = emojis,
                   total = c(#sum(data$shares_count[data$page_name=="bbc"]),
                             #sum(data$likes_count[data$page_name=="bbc"]),
                             sum(data$love_count[data$page_name == "bbc"]),
                             sum(data$haha_count[data$page_name == "bbc"]),
                             sum(data$sad_count[data$page_name == "bbc"]),
                             sum(data$angry_count[data$page_name == "bbc"]),
                             sum(data$thankful_count[data$page_name == "bbc"])
                   ))
plot <- ggplot(data_bbc, aes(names, total, label = emoji, fill = names)) +
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_bbc$names, labels = data_bbc$emoji) +
  ggtitle("Reactions to BBC posts") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab("Reaction")
ggplotly(plot)
```
</div>
### CNN reactions
```{r}
# Total of each reaction across all CNN posts
data_cnn <- tibble(names = names,
                   emoji = emojis,
                   total = c(#sum(data$shares_count[data$page_name=="cnn"]),
                             #sum(data$likes_count[data$page_name=="cnn"]),
                             sum(data$love_count[data$page_name == "cnn"]),
                             sum(data$haha_count[data$page_name == "cnn"]),
                             sum(data$sad_count[data$page_name == "cnn"]),
                             sum(data$angry_count[data$page_name == "cnn"]),
                             sum(data$thankful_count[data$page_name == "cnn"])
                   ))
plot <- ggplot(data_cnn, aes(names, total, label = emoji, fill = names)) +
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_cnn$names, labels = data_cnn$emoji) +
  ggtitle("Reactions to CNN posts") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab("Reaction")
ggplotly(plot)
```
### ABC reactions
```{r}
# Total of each reaction across all ABC posts
data_abc <- tibble(names = names,
                   emoji = emojis,
                   total = c(#sum(data$shares_count[data$page_name=="abc"]),
                             #sum(data$likes_count[data$page_name=="abc"]),
                             sum(data$love_count[data$page_name == "abc"]),
                             sum(data$haha_count[data$page_name == "abc"]),
                             sum(data$sad_count[data$page_name == "abc"]),
                             sum(data$angry_count[data$page_name == "abc"]),
                             sum(data$thankful_count[data$page_name == "abc"])
                   ))
plot <- ggplot(data_abc, aes(names, total, label = emoji, fill = names)) +
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_abc$names, labels = data_abc$emoji) +
  ggtitle("Reactions to ABC posts") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab("Reaction")
ggplotly(plot)
```
Row
-----------------------------------------------------------------------
<div style="height:400px">
### Fox News reactions
```{r}
# Total of each reaction across all Fox News posts
data_fox <- tibble(names = names,
                   emoji = emojis,
                   total = c(#sum(data$shares_count[data$page_name=="fox"]),
                             #sum(data$likes_count[data$page_name=="fox"]),
                             sum(data$love_count[data$page_name == "fox"]),
                             sum(data$haha_count[data$page_name == "fox"]),
                             sum(data$sad_count[data$page_name == "fox"]),
                             sum(data$angry_count[data$page_name == "fox"]),
                             sum(data$thankful_count[data$page_name == "fox"])
                   ))
plot <- ggplot(data_fox, aes(names, total, label = emoji, fill = names)) +
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_fox$names, labels = data_fox$emoji) +
  ggtitle("Reactions to The Fox News Channel posts") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab("Reaction")
ggplotly(plot)
```
</div>
### LA Times reactions
```{r}
# Total of each reaction across all Los Angeles Times posts
data_laTimes <- tibble(names = names,
                       emoji = emojis,
                       total = c(#sum(data$shares_count[data$page_name=="laTimes"]),
                                 #sum(data$likes_count[data$page_name=="laTimes"]),
                                 sum(data$love_count[data$page_name == "laTimes"]),
                                 sum(data$haha_count[data$page_name == "laTimes"]),
                                 sum(data$sad_count[data$page_name == "laTimes"]),
                                 sum(data$angry_count[data$page_name == "laTimes"]),
                                 sum(data$thankful_count[data$page_name == "laTimes"])
                       ))
plot <- ggplot(data_laTimes, aes(names, total, label = emoji, fill = names)) +
  geom_bar(stat = "identity") +
  geom_text(family = "EmojiOne", size = 6, vjust = -.5) +
  scale_x_discrete(breaks = data_laTimes$names, labels = data_laTimes$emoji) +
  ggtitle("Reactions to Los Angeles Times posts") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab("Reaction")
ggplotly(plot)
```
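
The reaction charts above all rely on the same kind of per-page totals. As a sketch of an alternative, those totals could be computed for every page in one pass with `dplyr` and `tidyr`. The toy data and the name `reaction_totals` are assumptions for illustration only; on the real dataset the column selection would need to list the reaction columns explicitly, since `likes_count`, `comments_count` and `shares_count` also end in `_count`.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the posts table (values are made up).
toy <- tibble(
  page_name  = c("bbc", "bbc", "cnn"),
  love_count = c(1, 2, 5),
  haha_count = c(0, 3, 1),
  sad_count  = c(4, 0, 0)
)

reaction_totals <- toy %>%
  group_by(page_name) %>%
  # Sum every reaction column within each page
  summarise(across(ends_with("_count"), sum), .groups = "drop") %>%
  # One row per page and reaction
  pivot_longer(-page_name, names_to = "reaction", values_to = "total") %>%
  mutate(reaction = sub("_count$", "", reaction))

print(reaction_totals)
```

A long table like this can feed a single faceted plot (`facet_wrap(~ page_name)`) instead of one chunk per page.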
Row
-----------------------------------------------------------------------
<div style="height:400px">
### Donut chart: BBC reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_bbc,
fill = "viridis",
hole = 0.5,
main = "BBC",
values_size=2,
labels_cex=2,
main_cex=2)
```
</div>
### Donut chart: CNN reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_cnn,
fill = "viridis",
hole = 0.5,
main = "CNN",
values_size=2,
labels_cex=2,
main_cex=2)
```
### Donut chart: ABC reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
# Third donut of the row uses the ABC totals (the BBC chart is already shown first)
lessR::PieChart(x = names,
                y = total,
                data = data_abc,
                fill = "viridis",
                hole = 0.5,
                main = "ABC",
                values_size = 2,
                labels_cex = 2,
                main_cex = 2)
```
Row
-----------------------------------------------------------------------
<div style="height:400px">
### Donut chart: Fox News reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_fox,
fill = "viridis",
hole = 0.5,
main = "The Fox",
values_size=2,
labels_cex=2,
main_cex=2)
```
</div>
### Donut chart: LA Times reactions
```{r, message=FALSE, warning=FALSE, results='hide'}
lessR::PieChart(x=names,
y =total,
data = data_laTimes,
fill = "viridis",
hole = 0.5,
main = "LA Times",
values_size=2,
labels_cex=2,
main_cex=2)
```
Text and sentiment analysis {data-icon="fa-user"}
===
Row {.tabset .tabset-fade}
--------
```{r}
# Returns the 8 most frequent bigrams per page among posts whose
# message contains `filter_word` (matched case-insensitively)
get_bigram_filtered <- function(filter_word) {
  data %>%
    dplyr::filter(stringr::str_detect(tolower(message), filter_word)) %>%
    # Link and photo posts are excluded
    dplyr::filter(post_type != "link", post_type != "photo") %>%
    dplyr::select(page_name, name, message) %>%
    unnest_tokens(bigram, message, token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    # Drop bigrams that contain an English stop word
    dplyr::filter(!word1 %in% stop_words$word,
                  !word2 %in% stop_words$word) %>%
    count(page_name, word1, word2, sort = TRUE) %>%
    unite("bigram", c(word1, word2), sep = " ") %>%
    # Drop URL fragments and outlet names
    dplyr::filter(!stringr::str_detect(bigram, "http|bbc|fox")) %>%
    # Rows are already sorted by count, so the first 8 per page are the top 8
    dplyr::group_by(page_name) %>%
    dplyr::slice_head(n = 8) %>%
    dplyr::ungroup()
}
```
<div style="height:600px">
### Basketball
```{r}
data_bigram_sport <- get_bigram_filtered("basketball")
```
#### Filtered word: Basketball
```{r}
plot1 <- data_bigram_sport %>%
mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
drlib::scale_x_reordered() +
facet_wrap(~ page_name, ncol = 2, scales = "free") +
coord_flip() +
ylab("") +
xlab("") +
ggtitle("Most frequent word pairs per page, filtered by: Basketball") +
theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))
ggplotly(plot1)
```
</div>
<div style="height:600px">
### Image
```{r}
data_bigram_image <- get_bigram_filtered("image")
```
#### Filtered word: Image
```{r}
plot1 <- data_bigram_image %>%
mutate(page_name = factor(page_name) %>% forcats::fct_rev()) %>%
ggplot(aes(drlib::reorder_within(bigram, n, page_name), n, fill = page_name)) +
geom_bar(stat = "identity", alpha = .8, show.legend = FALSE) +
drlib::scale_x_reordered() +
facet_wrap(~ page_name, ncol = 2, scales = "free") +
coord_flip() +
ylab("") +
xlab("") +
ggtitle("Most frequent word pairs per page, filtered by: Image") +
theme(legend.position = 'none', plot.title = element_text(hjust = 0.5))
ggplotly(plot1)
```
</div>
Row {.tabset .tabset-fade}
--------
```{r}
# Builds a word-frequency table (at most 500 terms) from posts whose
# message contains `filter_word`
get_wordcloud_data <- function(filter_word) {
  filtered_data <- data %>% dplyr::filter(stringr::str_detect(message, filter_word))
  docs <- tm::Corpus(tm::VectorSource(filtered_data$message))
  # Clean the text: numbers, punctuation, extra whitespace, case, stop words
  docs <- docs %>%
    tm::tm_map(tm::removeNumbers) %>%
    tm::tm_map(tm::removePunctuation) %>%
    tm::tm_map(tm::stripWhitespace)
  docs <- tm::tm_map(docs, tm::content_transformer(tolower))
  docs <- tm::tm_map(docs, tm::removeWords, tm::stopwords("english"))
  dtm <- tm::TermDocumentMatrix(docs)
  matrix <- as.matrix(dtm)
  words <- sort(rowSums(matrix), decreasing = TRUE)
  df <- data.frame(word = names(words), freq = words)
  # Remove characters outside the printable ASCII range
  df$word <- gsub("[^ -~]", "", df$word)
  # head() avoids NA rows when fewer than 500 terms are available
  head(df, 500)
}
```
```{r}
data_wordcloud <- get_wordcloud_data("sport")
```
<div style="height:600px">
### Normal
#### Default word cloud
```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-dark')
```
</div>
<div style="height:600px">
### Color
#### Word cloud with a different color scheme
```{r message=FALSE}
wordcloud2(data=data_wordcloud, size=1.6, color='random-light', backgroundColor="black")
```
</div>
<div style="height:600px; display: flex; justify-content: center;">
### Shape
#### Word cloud with a different shape
```{r message=FALSE}
wordcloud2(data_wordcloud, size = 0.7, shape = 'star')
```
</div>
Row
------
<div style="height:600px">
### Sentiment analysis (AFINN)
```{r}
# Split every message into bigrams
data_bigram_all <- data[, c('page_name', 'name', 'message')]
data_bigram_all <- data_bigram_all %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
# The AFINN lexicon is provided through the textdata package
AFINN <- get_sentiments("afinn")
# Keep bigrams that start with "not" and score the second word
nots <- data_bigram_all %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 == "not") %>%
  inner_join(AFINN, by = c(word2 = "word")) %>%
  count(word2, value, sort = TRUE)
# Contribution = sentiment score times number of occurrences; show the 20 largest
plot <- nots %>%
  mutate(contribution = n * value) %>%
  arrange(desc(abs(contribution))) %>%
  head(20) %>%
  ggplot(aes(reorder(word2, contribution), n * value, fill = n * value > 0)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  xlab("Words preceded by 'not'") +
  ylab("Sentiment score * number of occurrences") +
  coord_flip() +
  theme(legend.position = 'none')
ggplotly(plot)
```
</div>
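
The negation analysis above pairs each word following "not" with a sentiment score and weights it by frequency. The sketch below reproduces that idea on three made-up messages. To stay self-contained it uses a hypothetical three-word lexicon (`toy_lexicon`) with AFINN-style integer scores rather than the real AFINN data.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Made-up messages for illustration.
posts <- tibble(message = c("This is not good news",
                            "Not good, not bad",
                            "We are not happy"))

# Hypothetical lexicon; the real analysis uses get_sentiments("afinn").
toy_lexicon <- tibble(word  = c("good", "bad", "happy"),
                      value = c(3, -3, 3))

negated <- posts %>%
  # unnest_tokens() lowercases the text and strips punctuation
  unnest_tokens(bigram, message, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 == "not") %>%
  inner_join(toy_lexicon, by = c(word2 = "word")) %>%
  count(word2, value, sort = TRUE) %>%
  # Score times frequency: how much each negated word would be misread
  mutate(contribution = n * value)

print(negated)
```

A large positive contribution marks a word that a unigram-based sentiment score would count as positive even though it is negated in the text.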
Row
-----
<div style="height:800px">
### Bigram network for the filter: Trump
```{r}
# Keep posts that mention "trump" and split their messages into bigrams
filtered_data <- data %>% dplyr::filter(stringr::str_detect(tolower(message), "trump"))
filtered_data <- filtered_data[, c('page_name', 'name', 'message')]
data_bigram <- filtered_data %>% unnest_tokens(bigram, message, token = "ngrams", n = 2)
# Drop URL fragments and outlet names
data_bigram <- data_bigram %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "http")) %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "bbc")) %>%
  dplyr::filter(!stringr::str_detect(tolower(bigram), "fox"))
# word1 and word2 stay in separate columns: graph_from_data_frame()
# reads the first two columns as the endpoints of each edge
bigram_graph <- data_bigram %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE) %>%
  filter(n > 50) %>%
  graph_from_data_frame()
```
```{r}
# Fixed seed so the force-directed layout is reproducible
set.seed(123)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
plot <- ggraph(bigram_graph, layout = "fr") +
  # Arrows point from the first word of each bigram to the second
  geom_edge_link(arrow = a, end_cap = circle(.07, "inches")) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()
girafe(ggobj = plot, width_svg = 10, height_svg = 10,
       options = list(opts_sizing(rescale = TRUE, width = .6)))
```
</div>
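
`graph_from_data_frame()` reads the first two columns of its input as the endpoints of each edge and keeps any remaining columns as edge attributes, so the two words of a bigram need to be in separate columns when the graph is built. The sketch below illustrates this on a toy edge list; the words and counts are made up.

```r
library(igraph)

# Toy bigram counts (hypothetical values).
edges <- data.frame(
  word1 = c("donald", "hillary", "white"),
  word2 = c("trump", "clinton", "house"),
  n     = c(120, 95, 60),
  stringsAsFactors = FALSE
)

# Columns 1 and 2 become edge endpoints; `n` becomes an edge attribute.
g <- graph_from_data_frame(edges)

n_nodes <- vcount(g)   # one node per distinct word
n_edges <- ecount(g)   # one edge per bigram
edge_n  <- E(g)$n      # bigram counts carried on the edges
print(V(g)$name)
```

The edge attribute `n` can then be mapped to an aesthetic in `ggraph`, for example edge transparency or width.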